adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs
Authors
Abstract
Recurrent Neural Networks (RNNs) are powerful models that achieve unparalleled performance on several pattern recognition problems. However, training RNNs is computationally difficult owing to the well-known “vanishing/exploding” gradient problems. In recent years, several algorithms have been proposed for training RNNs. These algorithms either exploit no (or limited) curvature information and have a cheap per-iteration cost, or attempt to gain significant curvature information at the expense of a higher per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAM and ADAGRAD, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present a novel stochastic quasi-Newton algorithm (ADAQN) for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method is judicious in storing and retaining L-BFGS curvature pairs, which is indirectly used as a means of controlling the quality of the steps. We present numerical experiments on two language modeling tasks and show that ADAQN performs on par with, if not better than, popular RNN training algorithms. These results suggest that quasi-Newton algorithms have the potential to be a viable alternative to first- and second-order methods for training RNNs.
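The abstract does not spell out the ADAQN update itself, so the following is only a minimal sketch of the standard L-BFGS two-loop recursion that such a scheme builds on, together with one simple way of being selective about which curvature pairs to keep. The function names (`lbfgs_direction`, `maybe_store_pair`), the threshold `eps`, and the acceptance test are illustrative assumptions, not the authors' method.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, pairs):
    """Standard two-loop recursion: returns -H * grad, where H is the
    L-BFGS inverse-Hessian approximation built from the stored
    (s, y) curvature pairs (oldest first, newest last)."""
    q = list(grad)
    coeffs = []
    # First loop: walk from the newest pair to the oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((alpha, rho))
    # Scale by gamma = s^T y / y^T y from the newest pair (identity if none).
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: walk from the oldest pair to the newest.
    for (s, y), (alpha, rho) in zip(pairs, reversed(coeffs)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def maybe_store_pair(pairs, s, y, eps=1e-8):
    """Hypothetical acceptance rule (an assumption, not taken from the paper):
    keep the pair only if the curvature s^T y is sufficiently positive."""
    if dot(s, y) > eps * dot(s, s):
        pairs.append((s, y))
        return True
    return False


# Bounded memory: the deque silently drops the oldest pair when full.
memory = deque(maxlen=5)
maybe_store_pair(memory, [1.0, 0.0], [2.0, 0.0])
step = lbfgs_direction([2.0, 4.0], memory)
```

With an empty memory the direction reduces to plain steepest descent; each accepted pair adds non-diagonal curvature information at a cost linear in the number of stored pairs, which is the trade-off between first- and second-order methods that the abstract describes.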
Related resources
An adaptive quasi-Newton algorithm for eigensubspace estimation
In this paper, we derive and discuss a new adaptive quasi-Newton eigen-estimation algorithm and compare it with the RLS-type adaptive algorithms and the quasi-Newton algorithm proposed by Mathew et al. through experiments with stationary and nonstationary data.
A class of multi-agent discrete hybrid non linearizable systems: Optimal controller design based on quasi-Newton algorithm for a class of sign-undefinite hessian cost functions
In the present paper, a class of hybrid, nonlinear and non linearizable dynamic systems is considered. The noted dynamic system is generalized to a multi-agent configuration. The interaction of agents is presented based on graph theory and finally, an interaction tensor defines the multi-agent system in leader-follower consensus in order to design a desirable controller for the noted system. A...
CSLMEN: A New Optimized Method for Training Levenberg Marquardt Elman Network Based Cuckoo Search Algorithm
RNNs have local feedback loops within the network, which allow them to store previously seen patterns. This network can be trained with gradient-descent backpropagation, and optimization techniques such as second-order methods (conjugate gradient, quasi-Newton, Levenberg-Marquardt) have also been used for network training [14, 15]. But still this algorithm is not definite to find the global m...
Quasi-Newton Methods for Nonconvex Constrained Multiobjective Optimization
Here, a quasi-Newton algorithm for constrained multiobjective optimization is proposed. Under suitable assumptions, global convergence of the algorithm is established.
A limited memory adaptive trust-region approach for large-scale unconstrained optimization
This study is concerned with a trust-region-based method for solving unconstrained optimization problems. The approach takes advantage of the compact limited memory BFGS updating formula together with an appropriate adaptive radius strategy. In our approach, the adaptive technique allows us to decrease the number of subproblems solved, while utilizing the structure of limited memory quasi-Newt...